{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "8c1348d3-4c0e-450f-8faf-19503f61b7b2",
   "metadata": {},
   "source": [
    "# LlamaCloud Demo"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "57082f55-66e0-44e1-8072-2450405c21d1",
   "metadata": {},
   "source": [
    "## Step 0: Setup environment config for LlamaCloud"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "e83a35ec-8e6c-475c-827c-20f46c4a3c43",
   "metadata": {},
   "outputs": [],
   "source": [
    "import nest_asyncio\n",
    "nest_asyncio.apply()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "013c59e6-34b2-4685-b8c3-10b9e9afc3a8",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "os.environ[\"LLAMA_CLOUD_API_KEY\"] = \"llx-\"\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"sk-\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9472b5e3-c203-410d-82a3-b19c7c0ed61b",
   "metadata": {},
   "source": [
    "## Step 1: Parse pdf with LlamaParse"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "4f713c84-2774-45c8-b030-a572273724db",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_parse import LlamaParse\n",
    "from llama_index.core import SimpleDirectoryReader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "e83d26e5-05ea-4bac-a06c-987e06993f1a",
   "metadata": {},
   "outputs": [],
   "source": [
    "parser = LlamaParse(\n",
    "    result_type=\"markdown\",  # \"markdown\" and \"text\" are available\n",
    "    num_workers=4,\n",
    "    verbose=True,\n",
    "    language=\"en\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "6414cbff-8b05-4599-bce6-5ce764600a40",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Started parsing the file under job_id 837b9595-bc68-4c02-85b2-ed76acb2f59b\n"
     ]
    }
   ],
   "source": [
    "file_extractor = {\".pdf\": parser}\n",
    "reader = SimpleDirectoryReader(\n",
    "    input_files=['data_resnet/resnet.pdf'], \n",
    "    file_extractor=file_extractor\n",
    ")\n",
    "docs = reader.load_data()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c25b8955-b6f7-43c2-8ba4-33a07fca0e2c",
   "metadata": {},
   "source": [
    "## Step 2: Build cloud index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "ba1896de-c850-44d2-8e79-cb75a04d669c",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.indices.managed.llama_cloud import LlamaCloudIndex"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "5d879e25-3ffc-4b90-9858-f9f3f6c83851",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Find your index at https://cloud.llamaindex.ai/project/c4bd96da-ba73-4572-b400-4fb2e53b2a95/deploy/e24629a3-5ca5-401e-b673-c3bb22874034\n"
     ]
    }
   ],
   "source": [
    "index = LlamaCloudIndex.from_documents(\n",
    "    name='resnet_0226',\n",
    "    documents=docs,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8375bb94-3386-4a21-9f6e-a31c7ff55e5b",
   "metadata": {},
   "source": [
    "## Step 3: Use your retrieval endpoint "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a7b5cb21-8f98-490d-9468-cb30e12c3d83",
   "metadata": {},
   "source": [
    "If you have a reference to the index: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "df510141-13be-4285-9fab-1906c56dc71c",
   "metadata": {},
   "outputs": [],
   "source": [
    "retriever = index.as_retriever(rerank_top_n=3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "63457127-5a86-4b78-81e9-848a07c37362",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 11 ms, sys: 10.7 ms, total: 21.7 ms\n",
      "Wall time: 1.34 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "nodes = retriever.retrieve('how is the result in ImageNet detection task?')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "8c3de876-fad7-41e4-b965-fb93b3909fd1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Node ID: 8a3280f3-c9e9-48c3-b9fb-77127c619cd8\n",
      "Text: This **result** won the 1st place in the **ImageNet**\n",
      "**detection** task in ILSVRC 2015, surpassing the second place by 8.5\n",
      "points (absolute).  ## **ImageNet** Localization  The **ImageNet**\n",
      "Localization (LOC) task [36] requires to classify and localize the\n",
      "objects. Following [40, 41], we assume that the image-level\n",
      "classifiers are first adopted...\n",
      "Score:  0.997\n",
      "\n",
      "Node ID: 8228859b-0e0c-4828-96ee-85ad71c2a3e7\n",
      "Text: Under this setting, the results are an mAP@.5 of 55.7% and an\n",
      "mAP@[.5, .95] of 34.9% (Table 9). This is our single-model **result**.\n",
      "Ensemble. In Faster R-CNN, the system is designed to learn region\n",
      "proposals and also object classifiers, so an ensemble can be used to\n",
      "boost both tasks. We use an ensemble for proposing regions, and the\n",
      "union set ...\n",
      "Score:  0.992\n",
      "\n",
      "Node ID: 189314cd-84ae-426f-8280-cd26ca8b79ac\n",
      "Text: **Detection** results on the PASCAL VOC 2007 test set. The\n",
      "baseline is the Faster R-CNN system. The system “baseline+++” include\n",
      "box refinement, context, and multi-scale testing in Table 9.  |system|\n",
      "net|data|mAP|areo|bike|bird|boat|bottle|bus|car|cat|chair|cow|table|do\n",
      "g|horse|mbike|person|plant|sheep|sofa|train|tv|\n",
      "|---|---|---|---|---|---|---|-...\n",
      "Score:  0.948\n",
      "\n"
     ]
    }
   ],
   "source": [
    "for node in nodes:\n",
    "    print(node)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9ba5001d-fcdf-4888-8940-0e7db5372df4",
   "metadata": {},
   "source": [
    "Alternatively, you can directly create a retriever:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "4e4c651b-a1d0-40b0-91ce-a7a96fd701fa",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.indices.managed.llama_cloud import LlamaCloudRetriever"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "2796361d-90cd-423c-9726-f638a7aefbbc",
   "metadata": {},
   "outputs": [],
   "source": [
    "retriever = LlamaCloudRetriever(\n",
    "    name='resnet_0226',\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "828c11dd-8b40-4262-a4c3-f0345b81a522",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%time\n",
    "nodes = retriever.retrieve('what is Deep Residual Learning?')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "58688348-fa6f-4daf-8d95-20604c6f5a54",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Node ID: e74dac90-e4f1-4344-87b3-7b35af1c52f6\n",
      "Text: **Residual** learning: a building block.  are comparably good or\n",
      "better than the constructed solution (or unable to do so in feasible\n",
      "time).  In this paper, we address the degradation problem by\n",
      "introducing a **deep** **residual** learning framework. Instead of\n",
      "hoping each few stacked layers directly fit a desired underlying\n",
      "mapping, we explicit...\n",
      "Score:  0.999\n",
      "\n",
      "Node ID: 5b856dc3-886e-4e2f-8720-c4720c892403\n",
      "Text: ## **Deep** **Residual** Learning for Image Recognition  Kaiming\n",
      "He Xiangyu Zhang Shaoqing Ren Jian Sun Microsoft Research\n",
      "arXiv:1512.03385v1 [cs.CV] 10 Dec 2015 {kahe, v-xiangz, v-shren,\n",
      "jiansun}@microsoft.com 20 20  ### Abstract  Deeper neural networks are\n",
      "more difficult to train. We present a **residual** learning framework\n",
      "to ease the train...\n",
      "Score:  0.998\n",
      "\n",
      "Node ID: 9018f86f-647f-4f58-8836-799733664df5\n",
      "Text: These methods suggest that a good reformulation or\n",
      "preconditioning can simplify the optimization.  Shortcut Connections.\n",
      "Practices and theories that lead to shortcut connections have been\n",
      "studied for a long time. An early practice of training multi-layer\n",
      "perceptrons (MLPs) is to add a linear layer connected from the network\n",
      "input to the output. ...\n",
      "Score:  0.961\n",
      "\n"
     ]
    }
   ],
   "source": [
    "for node in nodes:\n",
    "    print(node)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
